Comprehensive decoding mental processes from Web repositories of functional brain images

Associating brain systems with mental processes requires statistical analysis of brain activity across many cognitive processes. These analyses typically face a difficult compromise between scope (from domain-specific to system-level analysis) and accuracy. Using all the functional Magnetic Resonance Imaging (fMRI) statistical maps of the largest data repository available, we trained machine-learning models that decode the cognitive concepts probed in unseen studies. For this, we leveraged two comprehensive resources: NeuroVault, an open repository of fMRI statistical maps with unconstrained annotations, and Cognitive Atlas, an ontology of cognition. We labeled NeuroVault images with the Cognitive Atlas concepts occurring in their associated metadata. We trained neural networks to predict these cognitive labels on tens of thousands of brain images. Overcoming the heterogeneity, imbalance and noise in the training data, we successfully decoded more than 50 classes of mental processes on a large test set. This success demonstrates that image-based meta-analyses can be undertaken at scale and with minimal manual data curation. It enables broad reverse inferences, that is, drawing conclusions about mental processes from the observed brain activity.

latter being the Human Connectome Project 42 (HCP), the largest fMRI cognitive study to date. We started by excluding images of the wrong modality, as well as those whose brain coverage was too low, that were too heavily thresholded, or whose values were unreasonable for contrast-effect statistical images (t- or z-statistics). This step yielded 54,000 unique maps.
Some of the maps we kept were missing part of the brain (at most 35%), often in areas that were not of interest for the considered experiment. We tried to impute those missing areas at the component level, either with the median value or by sampling among the other maps, but this did not improve the decoding performance compared with setting them to zero.
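As an illustration of the imputation strategies mentioned above, here is a minimal sketch that fills missing component loadings either with zeros or with the per-component median across maps. It assumes, purely for illustration, that missing loadings are encoded as NaN in an array of shape (maps, components); the function name `impute_missing` is hypothetical, and the sampling-based strategy is not covered.

```python
import numpy as np

def impute_missing(codes, strategy="zero"):
    """Fill NaN loadings in an (n_maps, n_components) array.

    Assumption: missing components are marked with NaN.
    """
    codes = np.asarray(codes, dtype=float)
    missing = np.isnan(codes)
    if strategy == "zero":
        fill = np.zeros(codes.shape[1])
    elif strategy == "median":
        # Median of the observed loadings of each component across maps.
        fill = np.nanmedian(codes, axis=0)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # `fill` has one value per component and broadcasts over the maps.
    return np.where(missing, fill, codes)

# Toy example: 3 maps, 2 components, two missing loadings.
toy_codes = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
zero_filled = impute_missing(toy_codes, "zero")
median_filled = impute_missing(toy_codes, "median")
```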

A.2.3 NeuroVault anomalies
Illustration of some map anomalies. These three maps are labeled as regular fMRI statistical maps in NeuroVault, but the first lacks 40% of the brain volume, the second has unreasonably large values, and the third has only negative values.

A.2.4 Projection of fMRI data on dictionaries
Orthogonal projection over a dictionary. Let $X \in \mathbb{R}^{n \times v}$ be the original voxel maps, with $n$ the number of maps and $v$ the number of voxels in the grey matter mask, and let $D \in \mathbb{R}^{c \times v}$ be the dictionary of $c$ components. To compute the code $\tilde{X} \in \mathbb{R}^{n \times c}$ over the components of the dictionary efficiently, we multiplied the maps matrix by the Moore-Penrose pseudo-inverse of the dictionary (noted $D^{\dagger}$). As the dictionary's components are linearly independent (but not orthogonal), this simply yields Eq (12):
$$\tilde{X} = X D^{\dagger} = X D^{\top} \left( D D^{\top} \right)^{-1} \qquad (12)$$
These vectors of loadings over the brain components $\tilde{X}$ can then be simply re-projected onto the voxel space as $\hat{X}$ in Eq (13):
$$\hat{X} = \tilde{X} D \qquad (13)$$
For example, this is useful to visualize the brain areas that are most significant for a statistical model trained on the component space, as in Supplementary Section A.7.
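The projection and re-projection described above can be sketched with NumPy as follows. This is a minimal illustration on random data, not the code used in the study; the function names and array sizes are hypothetical, and the maps are stored as rows (one map per row, one voxel per column).

```python
import numpy as np

def project_on_dictionary(maps, dictionary):
    """Loadings of `maps` (n x v) over the rows of `dictionary` (c x v)."""
    dictionary_pinv = np.linalg.pinv(dictionary)  # shape (v, c)
    return maps @ dictionary_pinv                 # shape (n, c)

def reproject_to_voxels(codes, dictionary):
    """Map loadings (n x c) back to voxel space (n x v)."""
    return codes @ dictionary

# Toy data: 10 maps, 200 voxels, 5 linearly independent components.
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 200))
X = rng.normal(size=(10, 200))

codes = project_on_dictionary(X, D)
X_hat = reproject_to_voxels(codes, D)
```

Because the operation is an orthogonal projection onto the span of the components, projecting `X_hat` again returns the same loadings, which is a convenient sanity check.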

A.3.1 Basic labeling
We started by extracting the Cognitive Atlas concepts that we could match exactly in the map annotations, despite the wording differences between Cognitive Atlas and NeuroVault. For many images, the control conditions are not provided or cannot be reliably extracted from the metadata. Therefore, for consistency, we removed the control conditions for all the images: we removed from the annotations the text that is obviously related to a control condition (appearing after a "versus", "vs", ">"... in the contrast_definition or name fields). We also improved the annotations of the Human Connectome Project (HCP) study 42 (collection 4337) with previously existing rules that are reproduced in the repository (https://github.com/Parietal-INRIA/fmri_decoding). We removed the labels that are very rare (< 10 occurrences) as well as those whose occurrences are too correlated (|corr| > 0.95), as learning on scarce or overly correlated data may not yield meaningful results. The extracted set of labels for 29 000 maps of NeuroVault is supplied online at https://github.com/Parietal-INRIA/fmri_decoding/tree/master/extracted_labels.
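The two cleaning steps above (dropping the control condition, then filtering rare or highly correlated labels) can be sketched as follows. This is a simplified illustration rather than the repository's implementation: the regular expression, the function names, and the choice to keep the first label of each highly correlated group are assumptions.

```python
import re
import numpy as np

# Everything from "versus", "vs" or ">" onward is treated as the control condition.
CONTROL_SPLIT = re.compile(r"\s*(?:\b(?:versus|vs)\b|>).*$", flags=re.IGNORECASE)

def strip_control(text):
    """Remove the control-condition part of a contrast annotation."""
    return CONTROL_SPLIT.sub("", text).strip()

def filter_labels(Y, names, min_count=10, max_abs_corr=0.95):
    """Drop rare labels, then labels too correlated with an earlier kept label.

    Y is a binary (n_maps, n_labels) array; names lists the label names.
    """
    Y = np.asarray(Y, dtype=float)
    frequent = Y.sum(axis=0) >= min_count
    Y = Y[:, frequent]
    names = [name for name, keep in zip(names, frequent) if keep]
    if Y.shape[1] < 2:
        return Y, names
    with np.errstate(invalid="ignore", divide="ignore"):
        # Constant columns give NaN correlations; treat them as uncorrelated.
        corr = np.nan_to_num(np.corrcoef(Y, rowvar=False))
    kept = []
    for j in range(Y.shape[1]):
        if all(abs(corr[j, k]) <= max_abs_corr for k in kept):
            kept.append(j)
    return Y[:, kept], [names[j] for j in kept]

# Toy example: B duplicates A, C is rare, D is uncorrelated with A.
toy_Y = np.zeros((40, 4))
toy_Y[:20, 0] = 1   # A
toy_Y[:20, 1] = 1   # B, identical to A
toy_Y[:5, 2] = 1    # C, only 5 occurrences
toy_Y[::2, 3] = 1   # D
filtered_Y, filtered_names = filter_labels(toy_Y, ["A", "B", "C", "D"])
```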

A.3.2 Enriched concepts
An exploration of NeuroVault annotations illustrates some of the challenges in assigning cognitive labels by matching Cognitive Atlas concepts found in the metadata. First, Cognitive Atlas names concepts using a specific vocabulary, with denominations that are often quite long. In contrast, the annotations of NeuroVault are mostly unconstrained and uncurated. Some of them do not contain enough information to make sense of the experimental protocol and the studied cognitive concepts. For the others, the wording can differ a lot from that of Cognitive Atlas, and there is no guarantee of validity or homogeneity. For example, many studies use "right hand", "r.hand" or even "RH" instead of the Cognitive Atlas term right hand response execution. This limits the number of studies that we can use in the analysis.
Second, the maps include some spurious labels, as the annotations in some major collections use words that are related to the concept of interest of the overall experiment instead of the exact contrast of the map. For example, 786 maps corresponding to shape recognition appear with the concept name emotion in their annotations, as the corresponding study uses them as a baseline in an emotional task. This introduces false positives in the labels.
Last, there is structure between these concepts that we do not leverage in the first experiment. Some concepts have hypernymy relationships: a task involving auditory sentence comprehension should involve at least auditory sentence perception, auditory perception, perception, language comprehension and language as well. Some are also very close, even synonymous, or at least often used interchangeably in NeuroVault annotations. Automatically discriminating them from open data seems unreasonable: we do not expect the use of audition in NeuroVault annotations to convey a different meaning than the use of auditory perception. This last issue causes false negatives in the target labels.
Considering the relationships between concepts, we also defined an ontology 43 for the encountered concepts, rather than using the relationships from Cognitive Atlas, which we found too incomplete (see section A.3.3). This heuristic directed graph includes 27 synonymy and 158 hypernymy rules. A small part of this graph is illustrated in Supplementary Figure 2. We also applied those rules to the evaluation dataset to get a consistent structure between the labels for all the data. The lists of synonymy and hypernymy rules are presented in Tables 1 and 2.
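To illustrate how such rules can enrich a set of labels, here is a minimal sketch that first maps synonyms to a single concept and then adds all hypernyms transitively. The few rules below are illustrative only: they are inferred from the examples given in this section (reading, audition, auditory sentence comprehension), not copied from Tables 1 and 2, and the direction chosen for the synonymy rule is an assumption.

```python
# Illustrative rules only (assumptions, not the full rule tables).
SYNONYMS = {"audition": "auditory perception"}  # alias -> retained concept
HYPERNYMS = {
    "reading": {"language", "visual perception"},
    "auditory sentence comprehension": {
        "auditory sentence perception",
        "language comprehension",
    },
    "auditory sentence perception": {"auditory perception"},
    "auditory perception": {"perception"},
    "visual perception": {"perception"},
    "language comprehension": {"language"},
}

def enrich(labels):
    """Return the labels plus every concept they imply through the rules."""
    enriched = set()
    stack = [SYNONYMS.get(label, label) for label in labels]
    while stack:
        concept = stack.pop()
        if concept in enriched:
            continue
        enriched.add(concept)
        # Follow the inference arrows from this concept to its hypernyms.
        for parent in HYPERNYMS.get(concept, ()):
            stack.append(SYNONYMS.get(parent, parent))
    return enriched

reading_labels = enrich({"reading"})
```

Tracking visited concepts in a set also protects the traversal against accidental cycles in hand-written rules.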
Supplementary Figure 2. Missing concepts inference graph. Here we illustrate a part of the cognitive ontology used in this work, for some concepts related to language. The arrows show the inference directions. For example, whenever we assign the concept reading, we also add language and visual perception.

Relationships between Cognitive Atlas concepts appear too incomplete to be used. Cognitive Atlas includes some structure between its concepts, described in a rich graph database: concepts can have "kind of" or "part of" relationships. It also lists tasks designed to identify specific concepts. Yet, those relationships seem incomplete. Many concepts do not have any relationship, and some obvious relations seem to be missing. For example, auditory sentence comprehension is not related to any other auditory concept and does not appear as being tested by the common language processing fMRI task paradigm, whereas it should be.

A.4 Encoding method
Concept encoding with noise reduction. For a dataset $\{X, Y\}$, as a complement to our decoding goal of inferring the concepts $y$ for any activation map over brain components $x$, we compute the encoding map $x_l$ for any concept $l$. This is usually done by fitting a generalized linear model (GLM) with coefficients $\beta$ such that $X = Y\beta + \varepsilon$, where $\varepsilon$ is the noise. Since the concept space is highly structured (for example, in the data used in this work, $Y_{\text{perception}} \approx Y_{\text{audition}} + Y_{\text{visual perception}}$), the design matrix $Y$ can be ill-conditioned.
To better condition the matrix, we took inspiration from principal component regression (PCR, 44). A regular PCR would mix the design vector of the concept of interest with all the others in the principal components of the design matrix. To avoid that, we fitted a GLM for each concept and used as regressors the original design vector of the concept of interest combined with the principal components of the design matrix of all the other concepts.
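The per-concept regression described above can be sketched as follows. This is a simplified illustration with ordinary least squares on synthetic data; the function name, the number of retained principal components, the centering of the other concepts, and the added intercept are assumptions rather than details given in the text.

```python
import numpy as np

def encoding_map(X, Y, concept, n_components):
    """Encoding map of one concept over the brain components.

    X: (n_maps, n_brain_components) loadings; Y: (n_maps, n_concepts) labels.
    Regressors: the concept's own design vector, the leading principal
    components of the other concepts, and an intercept (assumption).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    target = Y[:, concept]
    others = np.delete(Y, concept, axis=1)
    others = others - others.mean(axis=0)
    # Principal component scores of the other concepts via SVD.
    U, S, _ = np.linalg.svd(others, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    design = np.column_stack([target, scores, np.ones(len(target))])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return beta[0]  # coefficients attached to the concept of interest

# Synthetic check: maps generated from known encoding maps plus small noise.
rng = np.random.default_rng(0)
Y_demo = rng.integers(0, 2, size=(200, 4)).astype(float)
B_demo = rng.normal(size=(4, 3))
X_demo = Y_demo @ B_demo + 0.01 * rng.normal(size=(200, 3))
estimated = encoding_map(X_demo, Y_demo, concept=0, n_components=3)
```

With all principal components of the other concepts retained, this reduces to an ordinary GLM fit; keeping fewer components is what trades a little bias for better conditioning.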

A.5 Decoding performance of NNoD variants
We compare the performance of different models in this setting in Table 4. The explored models have quite similar results. Still, the non-linear binary logistic model (a 2-layer perceptron) achieves a slightly better average AUC across concepts. This model is trained to minimize a binary logistic loss $L_{\text{bin}}$ with an elastic net (L1 + L2) regularization on the weights of both layers as in Eq (5), while applying dropout on the input and hidden layers. When training on the enriched dataset, as illustrated in Table 5, the explored architectures once again yield similar performance, with the non-linear binary logistic model yielding a slightly higher AUC.
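For readers who want a concrete picture of this objective, here is a minimal NumPy sketch of the forward pass and loss of a 2-layer perceptron with a binary logistic loss, an elastic net penalty, and dropout. It is not the training code of the study: the ReLU activation, the averaging over maps and concepts, the penalty weights, and all function names are assumptions, and Eq (5) may weight the terms differently.

```python
import numpy as np

def mlp_logits(X, W1, b1, W2, b2, rng=None, p_input=0.0, p_hidden=0.0):
    """Forward pass; dropout is applied only when a random generator is given."""
    def dropout(A, p):
        if rng is None or p == 0.0:
            return A
        mask = rng.random(A.shape) >= p
        return A * mask / (1.0 - p)  # inverted dropout keeps the expected scale

    hidden = np.maximum(dropout(X, p_input) @ W1 + b1, 0.0)  # ReLU (assumption)
    return dropout(hidden, p_hidden) @ W2 + b2

def binary_logistic_loss(logits, Y):
    """Mean binary cross-entropy over maps and concepts, computed stably."""
    return float(np.mean(np.logaddexp(0.0, logits) - Y * logits))

def elastic_net(weights, l1, l2):
    """L1 + L2 penalty summed over the weight matrices of both layers."""
    return float(sum(l1 * np.abs(W).sum() + l2 * np.square(W).sum() for W in weights))

def total_loss(X, Y, W1, b1, W2, b2, l1=1e-4, l2=1e-4, **dropout_kwargs):
    logits = mlp_logits(X, W1, b1, W2, b2, **dropout_kwargs)
    return binary_logistic_loss(logits, Y) + elastic_net([W1, W2], l1, l2)

# With all-zero parameters, every predicted probability is 0.5.
X_toy = np.ones((4, 6))
Y_toy = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
zero_loss = total_loss(X_toy, Y_toy, np.zeros((6, 3)), np.zeros(3),
                       np.zeros((3, 2)), np.zeros(2))
```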

A.6 Results on different data splits
We performed additional experiments measuring the accuracy of term identification on different studies, reported in Supplementary Figure 3.
Supplementary Figure 3. Accuracy of the classifier on other validation folds. Here we obtain AUC scores using the same procedure as in the main text, but using different validation sets to measure performance. (top) Scores obtained using NeuroVault collections 1952 and 503; (bottom) scores obtained using NeuroVault collections 504, 1964, 2447, 2606, 2978, 3235, 3467, 4022, 4339, 4341, 4815, 5802, 6298 and 6299. The set of terms that can be tested varies with the test sets, but overall the accuracy is of the same order as with the IBC validation set.
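The remark that the set of testable terms varies with the test set follows from how the AUC is defined: it can only be computed for a concept that has both positive and negative maps in the validation fold. A minimal sketch of a per-concept AUC that skips untestable concepts is given below; the function names and toy values are hypothetical, and the pairwise formulation is chosen for clarity rather than speed.

```python
import numpy as np

def auc(y_true, scores):
    """Probability that a positive map scores higher than a negative one.

    Ties count for one half. Returns NaN when one class is absent.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def per_concept_auc(Y_true, Y_score, names):
    """AUC for each concept that can be tested on this validation set."""
    Y_true = np.asarray(Y_true)
    Y_score = np.asarray(Y_score, dtype=float)
    results = {}
    for j, name in enumerate(names):
        value = auc(Y_true[:, j], Y_score[:, j])
        if not np.isnan(value):
            results[name] = value
    return results

# Toy fold: "language" is testable, "emotion" has no positive map.
toy_true = np.array([[0, 0], [0, 0], [1, 0], [1, 0]])
toy_score = np.array([[0.1, 0.2], [0.4, 0.3], [0.35, 0.6], [0.8, 0.1]])
toy_auc = per_concept_auc(toy_true, toy_score, ["language", "emotion"])
```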